perf: Boost decoding throughput and eliminate payload buffer allocations in core client#4195
perf: Boost decoding throughput and eliminate payload buffer allocations in core client#4195merchantmoh-debug wants to merge 1 commit intogoogle:masterfrom
Conversation
|
Force-pushed an architectural update to correct a memory scaling issue. While the previous commit successfully eliminated capacity-doubling allocations on the Request (encoding) side, using This updated commit introduces Symmetrical Pooling: This successfully combines the raw CPU parsing speed of contiguous slice scanning ( |
|
This is very interesting, @merchantmoh-debug - could you please add a benchmark test so that we can see the performance numbers before and after this change? |
6a8afb5 to
13a5e18
Compare
|
Update on CI Failure & Isolation of Concerns: The CI is currently failing on this PR. This failure is a feature, not a bug. The shift to Symmetrical Pooling uses the stricter Rather than hiding a test-suite bug fix inside a performance PR (which entangles concerns and complicates potential rollbacks), I have completely decoupled them. I have opened PR #4197 to fix the broken tests atomically. Once that bug fix is merged upstream, I will rebase this PR and it will pass CI cleanly. In the meantime, I've also pushed a comprehensive side-by-side benchmark suite ( |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4195 +/- ##
==========================================
- Coverage 93.71% 93.69% -0.03%
==========================================
Files 209 209
Lines 19770 19785 +15
==========================================
+ Hits 18527 18537 +10
- Misses 1046 1048 +2
- Partials 197 200 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@alexandear - Looks good? |
|
@merchantmoh-debug - just FYI... I'm trying to pull this PR and run the benchmarks locally so that we can add the "before/after" numbers to this PR to document the changes. |
|
I'm also seeing if I can improve the code coverage locally. |
|
Here are the results from running the benchmarks: |
Awesome, thanks Glenn. These numbers perfectly validate the core theory behind the change. The DecodeResponse benchmark is the standout. For a 500KB payload, the legacy By switching to Symmetrical Pooling with At scale, this completely eliminates the geometric GC pressure when parsing large GitHub API payloads (like massive list queries or repository manifests). |
There was a problem hiding this comment.
OK, so now that we have the benchmark numbers, and decoding has improved but encoding has regressed, I'm now trying to understand the consequences these changes will have to the thousands of clients that use this repo, if any.
First, we are introducing a new internal global variable for the buffer pool.
Second, we are increasing the internal complexity of NewRequest (which would arguably affect the maintainers more than the end users unless bugs are being introduced which, as far as I can tell, is not the case here).
I'm wondering if a decoding improvement like this might be better provided by clients of this library who might already have their own buffer pool already running locally? If so, then would it be possible to provide them hooks for using them (instead of forcing this upon all users)?
Not everyone using this client library is going to need high performance decoding for high volume requests, as GitHub rate limiting sees to it that performance is limited.
So I'm just trying to figure out the benefits here to the end user and if these changes are generally useful across the board.
I would love to hear thoughts from our other dedicated reviewers:
cc: @stevehipwell - @alexandear - @zyfy29 - @Not-Dhananjay-Mishra - @munlicode
|
@gmlewis Glenn, thank you for the thoughtful review and for running the benchmarks. Your intuition about Here is where I landed: 1. You are right about
|
That makes perfect sense to me, @merchantmoh-debug - let's please run with that. |
|
@gmlewis Glenn Done. I just pushed an update that strips the pooling logic out of I've pushed the simplified PR that strictly focuses on the |
|
@merchantmoh-debug - just for fun, I asked a local agent to improve the code coverage for diff --git a/github/github_test.go b/github/github_test.go
index b41ebd8f..97897204 100644
--- a/github/github_test.go
+++ b/github/github_test.go
@@ -6,6 +6,7 @@
package github
import (
+ "bytes"
"context"
"encoding/json"
"errors"
@@ -2222,6 +2223,109 @@ func TestDo_noContent(t *testing.T) {
}
}
+func TestDo_ioWriter(t *testing.T) {
+ t.Parallel()
+ client, mux, _ := setup(t)
+
+ mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ testMethod(t, r, "GET")
+ fmt.Fprint(w, "hello world")
+ })
+
+ req, _ := client.NewRequest(t.Context(), "GET", ".", nil)
+ var buf bytes.Buffer
+ _, err := client.Do(req, &buf)
+ assertNilError(t, err)
+
+ if buf.String() != "hello world" {
+ t.Errorf("Response body = %q, want %q", buf.String(), "hello world")
+ }
+}
+
+func TestDo_ioWriter_error(t *testing.T) {
+ t.Parallel()
+ client, mux, _ := setup(t)
+
+ mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ testMethod(t, r, "GET")
+ fmt.Fprint(w, "hello world")
+ })
+
+ req, _ := client.NewRequest(t.Context(), "GET", ".", nil)
+ w := &writerThatErrors{}
+ _, err := client.Do(req, w)
+ if err == nil {
+ t.Fatal("Expected error from io.Writer, got none")
+ }
+}
+
+type writerThatErrors struct{}
+
+func (w *writerThatErrors) Write(p []byte) (int, error) {
+ return 0, errors.New("write error")
+}
+
+func TestDo_invalidJSON(t *testing.T) {
+ t.Parallel()
+ client, mux, _ := setup(t)
+
+ type foo struct {
+ A string
+ }
+
+ mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ testMethod(t, r, "GET")
+ fmt.Fprint(w, "not valid json")
+ })
+
+ req, _ := client.NewRequest(t.Context(), "GET", ".", nil)
+ var body foo
+ _, err := client.Do(req, &body)
+ if err == nil {
+ t.Fatal("Expected JSON decode error, got none")
+ }
+}
+
+func TestDo_whitespaceOnlyBody(t *testing.T) {
+ t.Parallel()
+ client, mux, _ := setup(t)
+
+ type foo struct {
+ A string
+ }
+
+ mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ testMethod(t, r, "GET")
+ fmt.Fprint(w, " \n\t ")
+ })
+
+ req, _ := client.NewRequest(t.Context(), "GET", ".", nil)
+ var body foo
+ _, err := client.Do(req, &body)
+ assertNilError(t, err)
+
+ if body.A != "" {
+ t.Errorf("Response body = %v, want empty foo", body)
+ }
+}
+
+func TestDo_nilV_noop(t *testing.T) {
+ t.Parallel()
+ client, mux, _ := setup(t)
+
+ mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ testMethod(t, r, "GET")
+ fmt.Fprint(w, `{"A":"a"}`)
+ })
+
+ req, _ := client.NewRequest(t.Context(), "GET", ".", nil)
+ resp, err := client.Do(req, nil)
+ assertNilError(t, err)
+ if resp.StatusCode != http.StatusOK {
+ t.Errorf("Expected status %d, got %d", http.StatusOK, resp.StatusCode)
+ }
+}
+
func TestBareDoUntilFound_redirectLoop(t *testing.T) {
t.Parallel()
client, mux, _ := setup(t)Feel free to incorporate if you agree. |
alexandear
left a comment
There was a problem hiding this comment.
I'm not convinced we should ship this performance optimization yet. Nobody has reported this as a real-world pain point in issues.
I'd prefer to keep behavior simple unless we can point to concrete reports or production data showing this is a problem worth.
| b.ReportAllocs() | ||
| b.ResetTimer() | ||
| for i := 0; i < b.N; i++ { |
There was a problem hiding this comment.
I think we can the newest Go version for benchmaring code.
E.g., b.Loop is available in Go 1.24 https://pkg.go.dev/testing#B.Loop
There was a problem hiding this comment.
Benchmarks are not accurate. The legacy path uses io.ReadAll+json.Unmarshal, but the previous production code path in Client.Do was json.NewDecoder(resp.Body).Decode(v). This means the benchmark compares pooled-buffer decoding against an artificially slower baseline, not against the actual old behavior.
Also, both benchmarks use helper functions instead of running through Client.Do.
| // decode it. If v is nil, and no error happens, the response is returned as is. | ||
| // If rate limit is exceeded and reset time is in the future, Do returns | ||
| // *RateLimitError immediately without making a network API call. | ||
| func (c *Client) Do(req *http.Request, v any) (*Response, error) { |
There was a problem hiding this comment.
Tests currently do not cover the new pool-specific edge risks in this function.
E.g., no targeted test for very large body then small body reuse, and no guard test for memory-cap behavior with pooled buffers.
Description
This PR introduces two foundational performance optimizations to the core HTTP client (
github.go) aimed at maximizing throughput and eliminating garbage collection (GC) pressure for high-frequency API polling environments.1. Zero-Allocation Request Body Construction (
sync.Pool)Previously, every POST/PATCH/PUT request containing a JSON body dynamically allocated a new
*bytes.Bufferon the heap, triggering unpredictable slice resizing overhead during serialization.requestBufferPool. The JSON encoder now targets a pre-warmed pooled buffer, allowing the client to extract the exact[]byteslice needed for the payload before instantly returning the buffer to the pool.2. High-Throughput JSON Decoding (
io.ReadAll+json.Unmarshal)Previously,
Client.Doutilizedjson.NewDecoder(resp.Body).Decode(). While idiomatic for infinite streams, this mechanism performs excessive interface assertions and dynamic allocations, creating an artificial bottleneck for bounded REST payloads.io.ReadAllfollowed byjson.Unmarshal, leveraging fast-path contiguous byte slice scanning.EOF) or purely whitespace responses has been explicitly preserved to maintain strict 1:1 API compatibility with the legacyNewDecoderbehavior.Motivation
These optimizations are designed to harden the client for deployment in autonomous infrastructure
Verification
io.EOFfallback logic for empty responses remains functionally identical.enc.SetEscapeHTML(false)) during pooled encoding..golangci.ymlstrictly (shadowing disabled, no ineffectual assignments).